Bark-shift based nonlinear speaker normalization using the second subglottal resonance

Authors

  • Shizhen Wang
  • Yi-Hui Lee
  • Abeer Alwan
Abstract

In this paper, we propose a Bark-scale shift based piecewise nonlinear warping function for speaker normalization, and a joint frequency discontinuity and energy attenuation detection algorithm to estimate the second subglottal resonance (Sg2). We then apply Sg2 for rapid speaker normalization. Experimental results on children’s speech recognition show that the proposed nonlinear warping function is more effective for speaker normalization than linear frequency warping. Compared to maximum likelihood based grid search methods, Sg2 normalization is more efficient and achieves comparable or better performance, especially for limited normalization data.
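The idea of shifting frequencies on the Bark scale, as opposed to scaling them linearly in Hz, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the abstract does not give the piecewise form of the warping function, so the sketch applies a single uniform Bark shift; the Hz-to-Bark conversion uses Traunmüller's approximation, which is a choice made here; the function names and the Sg2 values in the usage note are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Hz -> Bark using Traunmüller's approximation (an assumption of this sketch)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark (valid for z < 26.28)."""
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_shift_warp(f_hz, sg2_speaker_hz, sg2_reference_hz):
    """Uniform Bark shift that moves the speaker's Sg2 onto the reference Sg2.

    Hypothetical simplification: the paper describes a piecewise nonlinear
    warp whose exact form is not stated in the abstract.
    """
    shift = hz_to_bark(sg2_reference_hz) - hz_to_bark(sg2_speaker_hz)
    return bark_to_hz(hz_to_bark(f_hz) + shift)


def linear_warp(f_hz, sg2_speaker_hz, sg2_reference_hz):
    """Linear frequency warping baseline: scale all frequencies by the Sg2 ratio."""
    return np.asarray(f_hz, dtype=float) * (sg2_reference_hz / sg2_speaker_hz)
```

For example, `bark_shift_warp(np.array([500.0, 1500.0, 3000.0]), 1800.0, 1400.0)` lowers every frequency for a speaker whose (illustrative) Sg2 of 1800 Hz is above a reference of 1400 Hz. Both warps map the speaker's Sg2 exactly onto the reference, but the Bark shift moves the other frequencies by amounts that are not proportional in Hz, which is the sense in which it is nonlinear.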

Similar resources

Automatic detection of the second subglottal resonance and its application to speaker normalization.

Speaker normalization typically focuses on inter-speaker variabilities of the supraglottal (vocal tract) resonances, which constitute a major cause of spectral mismatch. Recent studies have shown that the subglottal airways also affect spectral properties of speech sounds, and promising results were reported using the subglottal resonances for speaker normalization. This paper proposes a reliab...

A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation

In previous work [1], we proposed a speaker adaptation technique based on the second subglottal resonance (Sg2), which showed good performance relative to vocal tract length normalization (VTLN). In this paper, we propose a more reliable algorithm for automatically estimating Sg2 from speech signals. The algorithm is calibrated on children’s speech data collected simultaneously with acceleromet...

Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions

This paper proposes an automatic algorithm for estimating the first two subglottal resonances (SGRs), Sg1 and Sg2, from continuous speech of children, and applies it to automatic speaker normalization in mismatched, limited-data conditions. The proposed algorithm is based on the observation that Sg1 and Sg2 form phonological vowel feature boundaries, and is motivated by our recent SGR estimation...

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Generally speaking, the speaker dependence of a speech recognition system stems from speaker-dependent speech features. Variation in vocal tract length and/or shape is one of the major sources of inter-speaker variation. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) explore the bilinear warping VTL...

Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonances

This paper proposes an age-dependent scheme for automatic height estimation and speaker normalization of children’s speech, using the first three subglottal resonances (SGRs). Similar to previous work, our analysis indicates that children above the age of 11 years show different acoustic properties from those under 11. Therefore, an age-dependent model is investigated. The estimation algorithms...

Publication date: 2009